Object Tracking Service System Flow

The tracking system updates an internal state for each tracked object every time new inference data arrives. Conceptually, it operates as a “tracking-by-detection” pipeline:

  • Detection Input: The tracker takes as input the inference outputs from an object detection model (or any model that provides object regions/locations). For example, in a video, each frame might produce a set of bounding boxes with class labels and confidence scores. These detections serve as observations for the tracker.

  • State Prediction (Motion Modeling): For each existing track (object being tracked), the tracker predicts where that object should be in the new frame based on its previous state. Internally, the tracker uses a motion model (by default, a linear model such as constant velocity) implemented via a Kalman filter. The Kalman filter projects each track's last known state forward in time (position and velocity for a constant-velocity model, plus acceleration for higher-order models), providing a predicted location and an uncertainty region. This accounts for object movement between inferences and provides a degree of noise smoothing.

  • Data Association: Once predictions are made, the tracker must associate new detections with existing tracks, that is, decide which detection corresponds to which track. The tracker calculates a cost or similarity between every predicted track position and each new detection. This cost is based on how far apart they are or how much their bounding boxes overlap. For instance, the intersection-over-union (IoU) overlap between predicted and detected bounding boxes is a simple and effective similarity metric, typically converted to a cost as 1 − IoU. When tracking point coordinates, distance measures (Euclidean, or Haversine for geo-coordinates) are used instead. The object tracking service allows configuration of the association metric through the assignment function setting (described later).

  • Gating: Before attempting any match, the tracker applies a gating threshold to eliminate improbable matches. Gating means that if a detection is too far from a track's predicted position (cost exceeds a threshold), then that detection is not considered a valid match for that track. This prevents random or distant detections from erroneously linking to a track. For example, you might configure a maximum distance or minimum IoU that must be met; any pair failing this gating constraint is ignored in the association step. Gating creates a "window" around each predicted position where a detection must fall to be a candidate.

  • Assignment: After gating, the tracker solves the assignment problem for the remaining feasible matches. Chariot's tracker uses a one-to-one linear assignment method (the Hungarian algorithm) to find the optimal matching between tracks and detections. This algorithm takes the matrix of pairwise costs (for example, 1 − IoU or Euclidean distance) and finds the pairing of tracks to detections that yields the minimal total cost. This ensures that each track is matched to at most one detection and vice versa. This step produces three possible outcomes:

      • Some tracks will be matched to new detections (meaning the object was seen again roughly where expected)
      • Some detections may remain unassigned (no existing track matched); these likely represent new objects appearing
      • Some tracks may remain unassigned (no detection matched), meaning the object might have disappeared or been occluded

  • Track State Update: For each track that was successfully matched with a detection, the tracker updates the track's state using the new observation. This involves updating the object's last known position with the detected position (and updating internal state estimates). The Kalman filter's update step refines the state estimate given the new measurement, yielding a smoother trajectory and an updated covariance (uncertainty).

  • New Track Creation: Any detection in the new inference that was not matched to an existing track is assumed to represent a new object entering the scene. In these cases, the object tracking service will initialize a new track for each such detection. A new track ID is generated, and the detection's position becomes the start of a new track. Initially, new tracks may be in a tentative state (depending on configuration) until they persist for a few frames.

  • Track Termination (Life Cycle Management): For any existing track that was not matched with a new detection in the current update, the tracker marks that track as "missed" for this cycle. If a track misses detections repeatedly, the object has likely left the scene or is occluded for an extended period. The object tracking service keeps a counter of consecutive missed updates for each track. If a track has been unmatched for more than a configurable number of consecutive updates, the track is considered lost and is removed. This prevents stale tracks from lingering indefinitely when objects disappear. The max_missing_updates setting (see below) controls how many missed updates are allowed. If the object reappears after the track was terminated, it is treated as a new track with a new ID (i.e., the identity is not persisted once a track is lost).

  • Track Output: After each update cycle, the tracker will output the list of tracks. Each track output includes its track ID, current position (e.g., bounding box or point coordinates), the latest associated label and confidence score, and other metadata. You can query the Inference Store for these outputs via the SDK (or view them in the UI when integrated). This allows you to know, for example, that "Track ID 5 corresponds to the car currently at [x,y] with bounding box [w,h]." Track outputs are essentially an aggregated, higher-level view of the inference stream, where transient detection IDs are replaced by persistent track IDs.
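
The data association, gating, and assignment steps above can be sketched in plain Python. This is an illustrative sketch, not the service's implementation: the function names (iou, hungarian, associate), the min_iou parameter, the BIG sentinel, and the (x1, y1, x2, y2) box layout are all assumptions made for the example. Gated-out pairs receive a very large cost so that the Hungarian solver avoids them whenever a feasible pairing exists, and any such pair that is still selected is discarded afterwards.

```python
BIG = 1e6  # hypothetical sentinel cost for gated-out or padded pairs


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hungarian(cost):
    """Minimum-cost one-to-one assignment for a square matrix (O(n^3))."""
    n = len(cost)
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)  # row and column potentials
    p, way = [0] * (n + 1), [0] * (n + 1)    # p[j] = row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # walk back along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, n + 1)]


def associate(predicted, detections, min_iou=0.3):
    """Return (matches, unmatched_track_indices, unmatched_detection_indices)."""
    n, m = len(predicted), len(detections)
    size = max(n, m)
    if size == 0:
        return [], [], []
    # Pad to a square matrix; padded and gated-out cells keep the BIG cost.
    cost = [[BIG] * size for _ in range(size)]
    for i, box in enumerate(predicted):
        for j, det in enumerate(detections):
            overlap = iou(box, det)
            if overlap >= min_iou:          # gating on minimum IoU
                cost[i][j] = 1.0 - overlap  # cost = 1 - IoU
    matches = sorted((i, j) for i, j in hungarian(cost)
                     if i < n and j < m and cost[i][j] < BIG)
    matched_t = {i for i, _ in matches}
    matched_d = {j for _, j in matches}
    return (matches,
            [i for i in range(n) if i not in matched_t],
            [j for j in range(m) if j not in matched_d])


tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]
print(associate(tracks, dets))  # ([(0, 1), (1, 0)], [], [2])
```

In this example, both tracks find their shifted detections, and the third detection, which overlaps nothing, is left unmatched and would start a new track. For point coordinates, the same structure applies with a distance in place of 1 − IoU and a maximum-distance gate in place of min_iou.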

Summary of an Update Cycle: In simpler terms, whenever new inference data comes in, the tracker (1) predicts existing tracks' positions (using last state + motion model), (2) finds which new detections match which tracks (using gating and Hungarian algorithm), (3) updates matched tracks with new info, (4) creates new tracks for unmatched detections, and (5) removes any tracks that have gone too long without a match. By repeating this cycle, the service maintains continuous identities for objects through time.
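
Steps (1), (3), and (5) of this cycle can be sketched for a single track along one coordinate axis. The following is a minimal illustration under stated assumptions, not the service's code: the Track1D class, the step function, the noise values q and r, and the initial covariance are hypothetical choices, and only max_missing_updates corresponds to a setting named in this document. A measurement of None stands for an update in which no detection was matched to the track.

```python
class Track1D:
    """One coordinate of one track under a constant-velocity Kalman filter."""

    def __init__(self, first_position, q=0.01, r=0.5):
        self.pos, self.vel = float(first_position), 0.0
        self.P = [[1.0, 0.0], [0.0, 10.0]]  # velocity starts highly uncertain
        self.q, self.r = q, r               # assumed process / measurement noise
        self.missed = 0                     # consecutive missed updates

    def predict(self, dt=1.0):
        """Project the state forward: x = F x, P = F P F^T + Q."""
        (a, b), (c, d) = self.P
        self.pos += dt * self.vel
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]

    def update(self, z):
        """Refine the state with a measured position z."""
        (a, b), (c, d) = self.P
        s = a + self.r              # innovation variance
        k0, k1 = a / s, c / s       # Kalman gain for position and velocity
        y = z - self.pos            # innovation (measurement residual)
        self.pos += k0 * y
        self.vel += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


def step(track, measurement, max_missing_updates=2):
    """Run one update cycle; return False once the track should be removed."""
    track.predict()
    if measurement is None:
        track.missed += 1           # no detection matched this cycle
    else:
        track.update(measurement)
        track.missed = 0            # the counter only tracks consecutive misses
    return track.missed <= max_missing_updates


track = Track1D(first_position=0.0)
for z in [1.0, 2.0, 3.0, 4.0, 5.0, None, None, None]:
    alive = step(track, z)
    print(f"z={z} pos={track.pos:.2f} vel={track.vel:.2f} "
          f"missed={track.missed} alive={alive}")
```

While detections keep arriving, the velocity estimate settles near the true speed. During misses the track coasts on its predicted position while its position variance grows, and it is dropped once the miss counter exceeds max_missing_updates.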